Marginalized kernels for RNA sequence data analysis.
Authors
Abstract
We present novel kernels that measure the similarity of two RNA sequences while taking their secondary structures into account. Two types of kernels are presented: one for RNA sequences with known secondary structures, and one for sequences whose secondary structures are unknown. The latter, which we call the marginalized count kernel (MCK), employs a stochastic context-free grammar (SCFG) to estimate the secondary structure. We report computational experiments for the MCK using 74 sets of human tRNA sequence data: (i) kernel principal component analysis (PCA) for visualizing tRNA similarities, and (ii) supervised classification with support vector machines (SVMs). Both types of experiment show promising results for MCKs.
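The marginalization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each base carries a hidden label (here the hypothetical states `P` for paired and `U` for unpaired), that a small set of candidate labelings with posterior probabilities is already available (toy numbers stand in for what an SCFG would supply), and that counts are normalized by sequence length. Because the count kernel is an inner product of count vectors, averaging it over both posteriors equals the inner product of the expected count vectors. The function names `expected_counts` and `marginalized_count_kernel` are made up for this example.

```python
from collections import Counter


def expected_counts(seq, labelings):
    """Expected, length-normalized counts of (base, hidden state) pairs.

    labelings: list of (hidden_string, posterior_probability) for seq.
    The probabilities are assumed to sum to 1 (toy stand-in for an SCFG).
    """
    counts = Counter()
    for hidden, prob in labelings:
        if len(hidden) != len(seq):
            raise ValueError("hidden labeling must match sequence length")
        for base, state in zip(seq, hidden):
            counts[(base, state)] += prob / len(seq)
    return counts


def marginalized_count_kernel(seq_a, lab_a, seq_b, lab_b):
    """Inner product of the two expected count vectors."""
    ca = expected_counts(seq_a, lab_a)
    cb = expected_counts(seq_b, lab_b)
    # Counter returns 0 for keys that are absent from cb.
    return sum(value * cb[key] for key, value in ca.items())


# Toy data: two short sequences with hypothetical posteriors over labelings.
seq_a, lab_a = "GCAU", [("PUUP", 0.7), ("UUUU", 0.3)]
seq_b, lab_b = "GGCU", [("PUPU", 0.6), ("UUUU", 0.4)]

k_ab = marginalized_count_kernel(seq_a, lab_a, seq_b, lab_b)
k_ba = marginalized_count_kernel(seq_b, lab_b, seq_a, lab_a)
print(k_ab, k_ba)
```

A Gram matrix built from such values could then be passed to a kernel PCA or SVM routine that accepts precomputed kernels, which mirrors the two experiments described above.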
Similar resources
Marginalized kernels for biological sequences
MOTIVATION: Kernel methods such as support vector machines require a kernel function between objects to be defined a priori. Several studies have derived kernels from probability distributions, e.g., the Fisher kernel. However, a general methodology for designing a kernel has not been fully developed. RESULTS: We propose a reasonable way of designing a kernel when objects are generated from lat...
Kernels for small molecules
2 Graph kernels: 2.1 Implementation; 2.2 The Spectrum Kernel; 2.3 The Tanimoto Kernel; 2.4 The MinMax Kernel; 2.5 The Marginalized Kernel; 2.6 ...
A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences
More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...
Binet-Cauchy Kernels
We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continu...
A Randomized String Kernel and Its Application to RNA Interference
String kernels directly model sequence similarities without the necessity of extracting numerical features in a vector space. Since they better capture complex traits in the sequences, string kernels often achieve better prediction performance. RNA interference is an important biological mechanism with many therapeutical applications, where strings can be used to represent target messenger RNAs...
Journal:
- Genome informatics. International Conference on Genome Informatics
Volume: 13, Issue:
Pages: -
Publication date: 2002